Method of and An Operating Support System for Providing Performance Management in a Mobile Telecommunications System

ABSTRACT

In LTE, there is no control node that collects Performance Management (PM) data from base stations, like a Base Station Controller (BSC) or Radio Network Controller (RNC) does for 2G/3G systems. Instead, an Operating Support System (OSS) has to collect PM data directly from eNodeBs, thereby causing scalability issues. A method of improved PM for LTE networks uses the statistical counters defined in the eNodeB and counters created from elementary events and parameters of events. Specific counters or user-defined counters can be defined that are, for example, not traditionally implemented in the nodes, or that use events from additional Network Elements (NEs). The counter files and events are collected by an observation gateway or directly by the PM application, and are monitored in different time scales. The counters are also aggregated for different time periods, thereby providing scalability and time-based statistics for the counter values.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 13/509,077, filed 4 Sep. 2012, which was the National Stage of International Application No. PCT/EP2009/066806, filed 10 Dec. 2009, the disclosures of each of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present invention relates to telecommunications and, in particular, to performance management in mobile telecommunications systems for monitoring and optimizing system operation.

BACKGROUND

Performance Management (PM) functions, in a mobile telecommunications system, are used for monitoring, troubleshooting and optimization of the mobile telecommunications system. The PM functions are based on events and counters generated by the several system elements of the mobile telecommunications system, among which are radio access units, Base Stations (BS), radio network controllers, and other system nodes and servers.

Events are used to monitor and investigate elementary system operation. Relevant information about the operation of the system can be obtained from long-term observation of the events data.

Counters are used to obtain aggregate or statistical information of the system. Counters are implemented in the various system elements but can also be created from events and event parameters. There is a continuously increasing number of predefined counters that are recorded in the system elements.

Examples of recording events can be found in the General Performance Event Handler (GPEH) and User Equipment Traffic Recording (UETR) functions of system elements. An example of collecting statistical counters is the STATS function of system elements in, for example, third generation (3G) and Long Term Evolution (LTE) mobile telecommunications systems. Note that events data and counter values generated by a plurality of system elements may be collected by a common server or gateway instead of by the system elements themselves.

The events data and counter values may be forwarded directly, in real-time, to a management server or gateway that is part of an Operating Support System (OSS) by using a streaming application, for example. However, the events and counters may also be collected in files for a set period of time called the Result Output Period (ROP) before forwarding thereof to the OSS. ROP files are retrieved periodically from the system elements and processed in the OSS. Both real-time events data and counter values, as well as ROP files, may be available for processing by the OSS.

An OSS implements several types of PM functions such as traffic monitoring, troubleshooting, radio and transport network optimization. From the processed events and counters, Key Performance Indicators (KPIs) can be derived that are used for monitoring, troubleshooting and planning purposes. KPIs are used for high level monitoring and business planning functions. These functions are not necessarily part of the OSS.

An OSS may also include applications for user defined counters to be created from events or event combinations, which provides an extended observation possibility for telecommunications systems. Besides the above, the events data and counter values may be used by other applications as well.

LTE networks, for example, implement many auto-configuration functions and use default configurations which provide fast installation and stable operation in the initial phase of a system's setup. However, monitoring of the overall operation of the large number of system elements requires a centralized management system. Exceptional conditions and operations should be observed by a performance monitoring OSS. Performance monitoring tools and functions are also needed in order to optimize the operation of LTE.

A problem of LTE performance monitoring is the large number of nodes and cells, e.g., femtocells, that have to be monitored. These nodes, also called LTE eNodeBs, generate larger numbers of events and counters that have to be processed compared to previous mobile Radio Access Network (RAN) systems, such as GSM RAN, for example.

Scalability issues occur if the OSS has to communicate and collect events data and counter values directly for large numbers of system elements, such as the LTE eNodeBs. This is particularly true in the absence of (intermediate) control nodes, which are also used for collecting and pre-processing PM data from the system elements in GSM and WCDMA RAN systems.

Another problem with newly-deployed systems and system technology is the lack of reference data for the different parameters to derive KPIs, for example. Current PM monitoring functions need a lot of prior knowledge of the system, or decent operational experience, which means more expensive implementations and increased operating expenses. This is in particular a problem for small system operators that are not able to invest in the implementation of such tools and do not have a large experienced staff for evaluation and system operation.

SUMMARY

It is an object of the present invention to provide performance management functions for a mobile telecommunications system that can be implemented in a central operating support system or as a separate performance management tool, and adapted to collect and process plural events data and counter values generated by plural system elements.

It is another object of the present invention to provide an expert tool for automatically extracting monitoring, troubleshooting and optimization information from the performance management functions of the mobile telecommunications system, for use by a systems operator.

A first aspect comprises a method of system Performance Management (PM) by an Operating Support System (OSS) of a mobile telecommunications system. The mobile telecommunications system comprises a plurality of nodes and radio access units servicing a plurality of cells generating a plurality of operational events data and counter values measured periodically for a first Result Output Period (ROP). The method comprises the steps of: collecting events data and counter values originating from the nodes and radio access units; aggregating the collected counter values periodically for a second and further ROPs having a duration longer than the first ROP, wherein the first and second and further ROPs are set corresponding to a specific operational event and counter; creating further counter values from the collected events data periodically for the second and further ROPs; processing the aggregated and further counter values corresponding to the originating nodes, radio access units and ROP; and analyzing the processed counter values for providing system operational performance indicia in different time scales.

The aggregation of counter values for a second and further result output periods, i.e., for a second, third, fourth, etc. result output period, provides scalability and adequate time-based statistics for the counters. For adequately identifying problems that occur on different time scales, the aggregation periods are set corresponding to specific events and counters. That is, events and counters that relate to short term problems are aggregated for a correspondingly short result output period and events and counters that relate to long term problems are aggregated for a correspondingly long result output period, for example. Note that some events and counters should be observed and aggregated both in short and long time periods.

Events data, which may include different parameters, are turned into further counter values that are not provided for by the counter values that are directly collected from the system elements, i.e., the nodes and radio access units of the mobile communications system. By creating such further counters for specific events periodically, corresponding to a respective set aggregation period or time scale for the specific events, self-consistency of the performance data available as events and counters in the system is maintained such that the aggregated and further counter values may be commonly processed in relation to the different system elements for providing performance management information in accordance with the time scale relevant for the specific information.

In this manner, larger numbers of events data and counter values compared to present performance management can be adequately handled and analyzed, such as the large number of performance management data generated in LTE, for example.

The events and counters are monitored and aggregated for different time periods. In a further example, for specific operational events and counters, different second and further ROPs are set corresponding to time periods related to usage of the mobile telecommunications system. The different second and further ROPs correspond to natural time periods of human life and behavior in relation to periods of communication traffic change and traffic load of the telecommunications system, such as five minutes, fifteen minutes, an hour, a day, a week, a month or a year.

The size of the collected events data and counter values may be as large as a few MBytes, which does not allow storing them for a long time. The aggregation method makes it possible to store the information in an aggregated way for a longer time. Accordingly, in a further example, the collected events data and counter values are stored for a period of time being a multiple of the respective second and further ROPs. For example, data aggregated for a period of 5 or 15 minutes need only to be stored for a few hours. Data aggregated for 1 hour can be dropped after a few days, etc. It will be appreciated that this is a significant advantage in the efficient use of and the provision of storage capacity.
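The storage rule described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the name RETENTION_MULTIPLE, the function is_expired and the chosen multiples are hypothetical and are not specified in this description.

```python
from datetime import datetime, timedelta

# Hypothetical retention multiples: each aggregated record is kept for
# N times its own ROP duration (values chosen for illustration only).
RETENTION_MULTIPLE = {
    timedelta(minutes=15): 16,  # 15-minute data kept for 4 hours
    timedelta(hours=1): 72,     # 1-hour data kept for 3 days
    timedelta(days=1): 30,      # 1-day data kept for 30 days
}


def is_expired(rop: timedelta, period_end: datetime, now: datetime) -> bool:
    """Return True if a record aggregated over `rop` may be dropped."""
    keep_for = rop * RETENTION_MULTIPLE[rop]
    return now - period_end > keep_for
```

A periodic clean-up task could call is_expired for every stored record and delete those for which it returns True.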

Processing of the aggregated and further counter values corresponding to the originating nodes, radio access units and ROP comprises, among other things, parsing of the aggregated and further counter values and extracting counter values for each counter per cell and node and storing them.

By creating counter value distributions for the extracted counter values, an adequate spatial statistics base is provided, serving as reference data for future analysis of the events and counters. The spatial statistics may be created for extracted counter values after filtering thereof with respect to set filter criteria relating to the cells and nodes of the telecommunications system. The filter criteria in fact specify the scope of monitoring and are also used to decrease the amount of data to be processed. Input filtering for (a group of) cells, for example, enables different analysis for rural and urban areas. Counters that are not of interest can be excluded from the analysis using the input filters as well. Filters can be added based on prior knowledge of the system or based on operational experience.

In an example, at least an average value and a standard deviation value of the counter value distributions of the thus created spatial statistics are calculated, among other things, to identify exceptional counter values. Outlier cells and nodes are identified, for example, by sorting counter values for different cells and nodes. By mapping cause patterns with the identified outlier cells and nodes, system operational performance indicia for the first and second and further ROPs are provided. By correlating the spatial statistics with time-based statistics, more detailed results are derived.

The sensitivity of the OSS is tuned by settable factors f and g, such that outlier cells and nodes are automatically identified if a deviation from the average is larger than f times the standard deviation value and if a number of outlier counters for a same cell is larger than g.

That is, typical error cases can be identified, for monitoring, troubleshooting and optimization of the mobile telecommunications system. The analysis is performed for different time scales, which allows for identifying problems that may be visible in a short or in a long time scale, for example.

Another aspect comprises an Operating Support System (OSS) for providing Performance Management (PM) of a mobile telecommunications system comprising a plurality of nodes and radio access units for servicing a plurality of cells. The nodes and radio access units are arranged for generating a plurality of operational events data and counter values measured periodically for a first Result Output Period (ROP). The OSS comprises: a collecting unit, arranged for collecting events data and counter values originating from the nodes and radio access units; an aggregating unit, arranged for aggregating the collected counter values periodically for a second and further ROPs having a duration longer than the first ROP, wherein the first and second and further ROPs are set in relation to a specific operational event and counter; a counter creating unit, arranged for creating counter values from the collected events data periodically for the second and further ROPs; and a processing and analyzing unit, for processing the aggregated and further counter values in relation to the originating nodes, radio access units and ROP, and for analyzing the processed counter values for providing system operational performance indicia in different time scales, including the first and second and further ROPs.

The OSS may be comprised by software, hardware or a combination of software and hardware in a single node of a telecommunications system, by a plurality of collaborating nodes and even by a server, gateway or computer processing unit external to the telecommunications system.

In an embodiment, the OSS comprises a unit for setting different second and further ROPs, wherein the aggregating unit and counter creating unit are arranged for operating with set different second and further ROPs.

In a further embodiment, the processing and analyzing unit comprises a parser arranged for parsing the aggregated and further counter values for extracting counter values for each counter per cell and node, a storage unit arranged for storing the extracted counter values, a filter unit arranged for filtering extracted counter values with respect to set filter criteria relating to the cells and nodes, and a spatial statistics unit arranged for calculating spatial statistics comprising counter value distributions including an average value and a standard deviation value of the counter value distributions.

For analyzing spatial statistics for performance management purposes, in another embodiment the processing and analyzing unit further comprises a sorter unit, arranged for sorting counter values for different cells and nodes and for identifying outlier cells and nodes based on the calculated average value and standard deviation value of the sorted counter values.

The processing and analyzing unit may further comprise a mapping unit, arranged for mapping cause patterns with the identified outlier cells and nodes, and a presentation unit for presenting system operational performance indicia based on this mapping.

The above-mentioned and other features and advantages of the invention will be best understood from the following description referring to the attached drawings. In the drawings, like reference numerals denote identical parts or parts performing an identical or comparable function or operation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows, in a very schematic and illustrative manner, a basic architecture of a mobile telecommunications system comprising an Operating Support System (OSS).

FIG. 2 shows, in a schematic and illustrative manner, an example of an OSS in accordance with the present invention.

FIG. 3 shows, in a schematic and illustrative manner, an example of the BSO of FIG. 2, in accordance with the present invention.

FIG. 4 illustrates an example of the method according to the invention.

DETAILED DESCRIPTION

The present invention will now be illustrated by way of example and not by way of limitation in a Radio Access Network (RAN) 2 of a mobile communications system 1, such as a Global System for Mobile communications (GSM), a General Packet Radio Service (GPRS), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA) RAN, or a Long Term Evolution (LTE) mobile telecommunications system supporting communication with mobile User Equipment (UE) 3 connecting via a wireless radio link 4 and radio access units 5 to the RAN 2. The RAN 2 comprises several nodes and servers arranged as Radio Network Controllers (RNCs) 6 for supporting the communication with switching nodes, such as a Mobile Switching Center (MSC) 8 and/or servers of an Internet Protocol (IP) Multimedia Subsystem (IMS) 9 which operatively connect 10 to the RAN 2.

The radio access units 5, called Radio Base Stations (RBSs) in a GSM system, for example, and eNodeBs in an LTE communications system, for example, provide service to UEs 3 in a restricted geographical area, called a cell 15, and connect operatively 7 to the RAN 2 for exchanging calls and data between the different UE 3 and other subscribers and users of the telecommunications system 1.

For the purpose of the present invention, the manner of call handling and data exchange between the several subscribers and users of the telecommunications system is not of importance, such that this will not be further detailed herein. Further, this is knowledge which is fully within the reach of the person skilled in the art.

For the present invention it is important that several radio access units 5 and RNCs 6 of the telecommunications system 1, and in particular the RAN 2, are communicatively connected to a central management or Operating Support System (OSS) 11 of the telecommunications system 1, illustratively indicated by dashed lines 12. The connections 12 may, for example, be streaming connections, for the real-time streaming of events data and counter values to the OSS 11.

In the present description and claims, the radio access units 5 and RNCs 6 are also termed system elements (SEs). The SEs 5, 6 generate internal and external events about their operation. Each event may include one or more parameters that are linked to the event. In the SEs 5, 6 several counters are implemented to obtain aggregate or statistical information of the system. In FIG. 1, these events and counters are schematically indicated by reference numerals 13 and 14, respectively. The counter values are measured by the SEs 5, 6 periodically for a time period, the first Result Output Period (ROP), and the result is stored in ROP files. A basic first ROP is typically 5-15 minutes and is set corresponding to a specific counter. Note that the RNC nodes 6 may collect and store the events and most of the counters related to the controlled radio access units 5. The ROP files and/or the events from the SEs 5, 6 are forwarded to the OSS 11.

FIG. 2 shows schematically an example of an OSS 20 in accordance with the present invention. The OSS 20 includes several Performance Management (PM) functions performed by a data collecting unit, called a Performance Management Gateway (PMG) 21, a counter creating unit, called Event Based Applications (EBA) 22, an aggregating unit, called Counter Aggregation (CA) 23, and a processing and analyzing unit, called Bulk Statistical Observation (BSO) 24. The OSS 20 further comprises a unit 25 for setting different second and further ROPs, i.e. third, fourth, fifth, etc. ROPs, for use by the EBA 22 and the CA 23.

The events 13 generated by the several radio access units 5 and RNCs 6 as shown in FIG. 1 are either forwarded to the PMG 21 or to the other PM functions 22, 23, 24 of the OSS 20.

The counter values 14 and basic first ROP files are collected by the PMG 21 and are a primary input of the CA 23. Here the input counter values and ROP files are stored for a time period and periodically aggregated for a second and further ROPs having a duration longer than the first ROP. In the unit 25, the ROPs are set in relation to specific operational events and counters. The second and further ROPs are created for natural time periods, which correspond to the periods of human life and behavior in relation to the use of the telecommunications system 1, for example ROPs of 1 hour, 1 day, 1 week, 1 month, 1 year. Basic ROPs of 5 or 15 minutes are, for example, aggregated for 1 hour. The 1 hour ROPs are aggregated for 1 day periods. The 1 day ROPs are aggregated for 1 week periods, etc. It is assumed that these periods correspond to the periodic change of the traffic volume and composition, i.e., speech, multimedia data, internet related data, metering data, etc. The CA 23 provides several ROP files as input to the BSO 24.
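The cascaded aggregation performed by the CA 23 can be sketched as follows. This is a minimal illustration, not the actual implementation: the record layout, the function name aggregate and the use of summation as the aggregation operation are assumptions. Counters that represent levels rather than event counts would need a different operation, such as an average or a maximum.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate(records, rop: timedelta):
    """Sum counter values into buckets of length `rop`.

    `records` is an iterable of (period_start, cell, counter, value)
    tuples (assumed layout). Returns a dict keyed by
    (bucket_start, cell, counter) holding the summed value.
    """
    epoch = datetime(1970, 1, 1)  # arbitrary alignment origin
    totals = defaultdict(float)
    for start, cell, counter, value in records:
        # Align the timestamp to the start of its longer ROP bucket.
        bucket = epoch + ((start - epoch) // rop) * rop
        totals[(bucket, cell, counter)] += value
    return dict(totals)
```

Calling aggregate on 15-minute records with rop=timedelta(hours=1) yields 1 hour ROP values. The output can be turned back into records and aggregated again with timedelta(days=1), mirroring the 1 hour to 1 day step described above.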

The EBA function 22 of the OSS 20 creates counters from the events and parameters included in the events collected by the PMG 21, for the same time periods, i.e., the second and further ROPs as in the CA 23. In the EBA function 22 user defined counters can be specified, that are not implemented in the SEs 5, 6, for example. The EBA function 22 can also be used to define counters based on multiple events from different SEs 5, 6. The thus created counter values are input to the BSO 24.
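A user defined counter of the kind created by the EBA function 22 can be sketched as a predicate over events that is counted per ROP and per cell. The event layout (a dict with "time", "cell", "name" and further parameters), the function name count_events and the example event names are hypothetical and serve only as illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_events(events, rop: timedelta, predicate):
    """Count events matching `predicate` per (ROP bucket start, cell)."""
    epoch = datetime(1970, 1, 1)  # arbitrary alignment origin
    counts = Counter()
    for ev in events:
        if predicate(ev):
            # Align the event time to the start of its ROP bucket.
            bucket = epoch + ((ev["time"] - epoch) // rop) * rop
            counts[(bucket, ev["cell"])] += 1
    return counts


def is_radio_link_failure(ev):
    """Hypothetical user defined counter: releases caused by link failure."""
    return ev["name"] == "release" and ev.get("cause") == "rlf"
```

Because the predicate can inspect any event parameter, different counters can be created for different parameters of the same event.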

The BSO operational units and functions are displayed in FIG. 3. BSO 24 receives the first or basic ROP files 29 and the aggregated or second and further ROPs files 26, 27, 28, as input. A parser unit 30 parses the counter files 26, 27, 28, 29 and extracts the counter values for each activated counter per cell 15. The data are stored in a storage unit 31, such as a RAM or a database. The data are stored in the storage unit 31 in order to make the data available for historical analysis. This makes it possible, for example, to compare actual collected and processed events and counter data with similar data of previous time periods. Depending on the size of the data and the available database capacity, 10 to 100 ROP files are stored per accumulation period (ROP).

The data are further applied to a filter unit 32, thereby specifying the scope of monitoring and decreasing the amount of data to be processed. The filtering can be performed for a group or groups of cells 15, to prepare different analysis for rural and urban areas, for example. Counters that are not of interest for a particular analysis should be excluded from such analysis, for which the filter unit 32 can be applied as well. By default all cells and parameters are included. Filters can be added based on prior knowledge of the system or based on operation experience, for example.
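The behavior of the filter unit 32 can be sketched as follows, assuming the extracted counter values are held as a mapping from (cell, counter) to value. The function name apply_filters and its parameters are hypothetical.

```python
def apply_filters(values, cells=None, exclude_counters=()):
    """Filter a {(cell, counter): value} mapping.

    With the default arguments all cells and counters pass, matching
    the default described above. `cells` restricts the scope to a group
    of cells; `exclude_counters` drops counters that are not of interest.
    """
    return {
        (cell, counter): value
        for (cell, counter), value in values.items()
        if (cells is None or cell in cells) and counter not in exclude_counters
    }
```

A separate analysis for rural and urban areas could, for example, call apply_filters twice with two different sets of cells.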

For each counter that passes the filter unit 32, spatial statistics are created by a spatial statistics unit 33, which means that a statistical distribution of the counter values is created for the cells 15, or SEs 5, 6. From these statistics, the average and standard deviation are calculated for the different time scales, i.e., the first and second and further ROPs. Other quantities that characterize the distribution may also be obtained.

The counter values for the different cells are sorted per counter and outlier cells are identified by a sorter unit 34. The counter value is an outlier, for example, if the deviation from the average is larger than a factor f times the standard deviation value, where f=3 or another value to be set. Accordingly the value of f is used to control the number of outliers.

Another factor g to be set is the number of outlier counters for the same cell. For example, if g=5 and a cell exceeds this number, there are more than 5 parameters that have an extraordinary value, and they should belong to the same root cause or root causes.

The parameters f and g are used to tune the sensitivity of the OSS system 20. The actual values of the factors f and g may depend on the type and size of the mobile telecommunications system and/or the operator, for example.
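The outlier identification with the factors f and g can be sketched as follows for the counter values of a single ROP. The data layout, the function name find_outlier_cells and the use of the population standard deviation are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_outlier_cells(values, f=3.0, g=5):
    """Identify outlier counters and cells for one ROP.

    `values` maps counter name -> {cell: value} (assumed layout).
    Returns (outliers, flagged): `outliers` maps each cell to the list
    of its counters deviating from the average by more than f standard
    deviations; `flagged` lists cells with more than g such counters.
    """
    outliers = defaultdict(list)
    for counter, per_cell in values.items():
        avg = mean(per_cell.values())
        std = pstdev(per_cell.values())  # population standard deviation
        if std == 0:
            continue  # all cells report the same value: no outliers
        for cell, value in per_cell.items():
            if abs(value - avg) > f * std:
                outliers[cell].append(counter)
    flagged = sorted(c for c, ctrs in outliers.items() if len(ctrs) > g)
    return dict(outliers), flagged
```

Raising f or g makes the analysis less sensitive. Note that with only a handful of cells no single value can deviate from the average by 3 population standard deviations, so a smaller f may be needed for small cell groups.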

Next, cause patterns are fitted to the outlying counters by a mapping unit 35, which helps the operator to identify the problem and the root cause of the problem. If, for example, call drop or data packet loss for a certain service is high in a particular cell 15 and at the same time the signal strength level is relatively low, it can be concluded that the drop or loss is due to the weak signal. Predefined cause patterns can be provided with the mapping function; however, the mapping function and unit 35 also provide the possibility for the operator to add new patterns based on operation experience.
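The mapping of cause patterns can be sketched as a lookup over the outlying counters of a cell. The pattern format, the counter names and the function name map_causes are hypothetical; the two predefined patterns merely restate the weak-signal example given above.

```python
# Hypothetical cause patterns: each pattern lists the counters that must
# be outlying, with the direction of the deviation, and names a cause.
CAUSE_PATTERNS = [
    ({"call_drop_rate": "high", "signal_strength": "low"}, "weak signal"),
    ({"packet_loss": "high", "signal_strength": "low"}, "weak signal"),
]


def map_causes(outliers, patterns=CAUSE_PATTERNS):
    """Return the causes whose pattern matches the outliers of a cell.

    `outliers` maps an outlying counter to "high" or "low".
    """
    causes = []
    for required, cause in patterns:
        matches = all(outliers.get(c) == d for c, d in required.items())
        if matches and cause not in causes:
            causes.append(cause)
    return causes
```

An operator could add a pattern based on operation experience by passing an extended list, for example CAUSE_PATTERNS + [({"handover_failures": "high"}, "neighbor list error")], where the added counter name and cause are again purely illustrative.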

The analysis is done for different time scales, i.e., different ROPs, as illustrated by the several rows of arrows corresponding to a particular ROP file 26, 27, 28, 29. This makes it possible to observe different types of problems. A problem may be identifiable only at a smaller time scale. On the other hand, in a short time scale there can be many outliers that make it difficult to identify the problem. Long scale observation may identify problems that are due to long time traffic increase, aging of connectors, devices, etc., that are not visible in a short time scale. This makes it possible to investigate the history of the values of different parameters.

This analysis assumes that the majority of the cells and SEs operate well. This is also a necessary criterion for a self-configuration system like LTE, where automating functions ensure the proper settings and operation. The results may be presented in different ways. For each time scale the cells that have outlying counters are presented and listed. In another view the relevant counters can be selected, or they are selected automatically as the counters having outlying values and the corresponding cells are indicated. These performance indicia are presented by the presentation unit 36 of the processing and analyzing unit 24.

FIG. 4 illustrates in a flow chart type diagram the steps of an example of performing the method of the invention, with reference to a telecommunications system as outlined by FIG. 1, in which events data and counter values are measured periodically for a first Result Output Period (ROP).

In a first step 40, events data and counter values originating from the nodes 6 and radio access units 5 are collected. In a second step 41, the collected counter values are periodically aggregated for a second and further ROPs having a duration longer than the first ROP, wherein the first and second and further ROPs are set corresponding to a specific operational event and counter. In a further step 42, counter values are created periodically for the second and further ROPs from the events data collected in step 40. By processing the aggregated and further counter values in step 43 corresponding to the originating nodes 6, radio access units 5 and ROP, and analyzing the processed counter values in step 44, system operational performance indicia in different time scales are provided.

Further examples of the method are elucidated above with reference to FIGS. 2 and 3.

With the present invention, an operator obtains a good overview of the system and network operation, which includes all available counters. It can be used for automatic monitoring and trouble-shooting as well. It provides a centralized performance monitoring method, i.e., operators can avoid complex drive tests, etc., in order to obtain a full picture of the system operation. The method and OSS system provided are able to monitor all available counters in a mobile communications system, as well as user-defined counters.

The invention makes use of the self-consistency of data instead of predefined thresholds and is self-adapting to different system deployment scenarios, traffic conditions, etc. There is no need for specific system or network knowledge to use the present performance management tool. On the other hand, it has the flexibility to add such knowledge to the processing and analyzing unit BSO 24, for example, if required.

The result is analyzed in different time scales; therefore, it is possible to notice errors that occur slowly, e.g., due to oxidation of connectors, as well as short temporary problems like large traffic bursts, packet delays, variations in packet delays, etc.

The present invention is not limited to the embodiments as disclosed above, and can be modified and enhanced by those skilled in the art beyond the scope of the present invention as disclosed in the appended claims without having to apply inventive skills.

What is claimed is:
 1. A method of system performance management by an Operating Support System (OSS) of a mobile telecommunications system, the mobile telecommunications system comprising a plurality of nodes and radio access units servicing a plurality of cells generating a plurality of operational events data and counter values measured periodically for a first Result Output Period (ROP), the method comprising: collecting events data and counter values originating from the nodes and radio access units, the events data corresponding to events; aggregating the collected counter values periodically for a second ROP and further ROP; wherein the first ROP, second ROP, and further ROP correspond to a specific operational event and counter; wherein the second and further ROPs have a duration longer than the first ROP; creating further counter values from the collected events data periodically for the second and further ROPs; processing the aggregated and further counter values; analyzing the processed counter values for providing system operational performance indicia in different time scales.
 2. The method of claim 1, wherein the second and further ROPs correspond to time periods related to usage of the mobile telecommunications system.
 3. The method of claim 1, further comprising storing the collected events data and counter values for a period of time being a multiple of the respective second and further ROPs.
 4. The method of claim 1: wherein the events include parameters; wherein the creating further counter values comprises creating further counter values for different parameters of an event.
 5. An Operating Support System (OSS) for providing performance management of a mobile telecommunications system, the mobile telecommunications system comprising a plurality of nodes and radio access units for servicing a plurality of cells and arranged for generating a plurality of operational events data and counter values measured periodically for a first Result Output Period (ROP), the OSS comprising: a collecting circuit configured to collect events data and counter values originating from the nodes and radio access units; an aggregating circuit configured to aggregate the collected counter values periodically for a second ROP and further ROP; wherein the first ROP, second ROP, and further ROP correspond to a specific operational event and counter; wherein the second and further ROPs have a duration longer than the first ROP; a counter creating circuit configured to create counter values from the collected events data periodically for the second and further ROPs; a processing circuit configured to: process the aggregated and further counter values in relation to the originating nodes, radio access units, and ROP; analyze the processed counter values for providing system operational performance indicia in different time scales, including the first and second and further ROPs.
 6. The OSS of claim 5, further comprising a setting circuit configured to set the different second and further ROPs; wherein the aggregating circuit and counter creating circuit are arranged for operating with set different second and further ROPs.
 7. The OSS of claim 6, wherein the setting circuit is further configured to set the second and further ROPs to time periods related to usage of the mobile telecommunications system.
 8. The OSS of claim 5, further comprising a storage circuit configured to store the collected events data and counter values for a period of time being a multiple of the respective second and further ROPs. 